[SPARK-51008][SQL] Add ResultStage for AQE #49715

liuzqt · 2025-01-28T19:50:59Z

What changes were proposed in this pull request?

Added ResultQueryStageExec for AQE

What the query plan looks like in the explain string:

AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   ResultQueryStage 2 ------> newly added
   +- *(5) Project [id#26L]
      +- *(5) SortMergeJoin [id#26L], [id#27L], Inner
         :- *(3) Sort [id#26L ASC NULLS FIRST], false, 0
         :  +- AQEShuffleRead coalesced
         :     +- ShuffleQueryStage 0
         :        +- Exchange hashpartitioning(id#26L, 200), ENSURE_REQUIREMENTS, [plan_id=247]
         :           +- *(1) Range (0, 25600, step=1, splits=10)
         +- *(4) Sort [id#27L ASC NULLS FIRST], false, 0
            +- AQEShuffleRead coalesced
               +- ShuffleQueryStage 1
                  +- Exchange hashpartitioning(id#27L, 200), ENSURE_REQUIREMENTS, [plan_id=257]
                     +- *(2) Ran...

No change in the Spark UI, since ResultQueryStage is ignored just like the other query stages.

Why are the changes needed?

Currently the AQE framework is not fully self-contained, because not all plan segments can be put into a query stage: the final "stage" is basically executed as a non-AQE plan. This PR adds a result query stage for AQE to unify the framework. With this change, we can build more query-stage-level features; one use case is described in #44013 (comment).

Does this PR introduce any user-facing change?

NO

How was this patch tested?

New unit tests.

Existing tests that are impacted by this change are also updated to keep their original test semantics.

Was this patch authored or co-authored using generative AI tooling?

NO

liuzqt · 2025-02-04T00:12:52Z

@cloud-fan


cloud-fan · 2025-02-04T05:08:49Z

cc @ulysses-you


ulysses-you

So this PR is just one of the stage-level feature PRs?

cloud-fan · 2025-02-05T07:35:37Z

@ulysses-you yes, after this PR, we can implement the proposed idea in #44013 (comment) and keep contexts in the AQE query stage.


cloud-fan

LGTM except for some style nits

ulysses-you · 2025-02-11T03:48:35Z

Shall we ignore ResultQueryStage in the Spark UI like the other query stages?

cloud-fan · 2025-02-11T03:51:50Z

I think we already did it for all query stages. @liuzqt how did you see result query stage in the UI?

ulysses-you · 2025-02-11T03:52:25Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala

+  }
+
+  // Result stage could be any SparkPlan, so we don't have a specific runtime statistics for it.
+  override def getRuntimeStatistics: Statistics = Statistics(sizeInBytes = 0, rowCount = None)


I'm concerned about sizeInBytes = 0: the dummy statistics used elsewhere in the Spark code is Long.MaxValue. Shall we use Statistics.DUMMY?

Makes sense, let's use Statistics.DUMMY instead.
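To illustrate the concern with `sizeInBytes = 0`: a size-based decision, such as a broadcast-threshold check, would treat a zero-sized plan as trivially small, whereas a `Long.MaxValue` placeholder reads as "unknown, assume huge". Below is a minimal Scala sketch of that difference. `Stats`, `Stats.Dummy`, and `BroadcastCheck` are hypothetical stand-ins written for this illustration; they are not Spark's `Statistics` class or its planner code.

```scala
// Simplified, hypothetical stand-in for a plan-statistics class (not Spark's real one).
final case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

object Stats {
  // Assumption: mirrors the "unknown size" convention mentioned in the review (Long.MaxValue).
  val Dummy: Stats = Stats(sizeInBytes = BigInt(Long.MaxValue))
}

object BroadcastCheck {
  // Hypothetical size-based check: a plan qualifies only if its estimated size
  // is non-negative and does not exceed the threshold.
  def canBroadcast(stats: Stats, thresholdBytes: Long): Boolean =
    stats.sizeInBytes >= 0 && stats.sizeInBytes <= thresholdBytes
}
```

With a 10 MiB threshold, `Stats(sizeInBytes = 0)` passes the check even though nothing is known about the plan, while `Stats.Dummy` does not, which is the safer default for a stage without real runtime statistics.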

liuzqt · 2025-02-12T05:23:31Z

> I think we already did it for all query stages. @liuzqt how did you see result query stage in the UI?

I think we need to explicitly match the name to ignore it (updated in this commit)
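The idea of ignoring a wrapper node by matching its name can be pictured with a small sketch. This is not Spark's UI code: `PlanNode`, `UiGraph`, and the set of hidden names are hypothetical and exist only to show the technique of skipping a named wrapper while still descending into its children.

```scala
// Hypothetical plan-tree node used only for this illustration.
final case class PlanNode(name: String, children: Seq[PlanNode] = Nil)

object UiGraph {
  // Assumption: wrapper node names to hide; the list is illustrative, not Spark's.
  private val hiddenWrappers: Set[String] =
    Set("ShuffleQueryStage", "BroadcastQueryStage", "ResultQueryStage")

  // Collects the names of the nodes that would be drawn, in pre-order.
  // A hidden wrapper contributes nothing itself, but its children are still visited.
  def visibleNames(node: PlanNode): Seq[String] = {
    val below = node.children.flatMap(visibleNames)
    if (hiddenWrappers.contains(node.name)) below else node.name +: below
  }
}
```

For a tree such as `ResultQueryStage -> Project -> ShuffleQueryStage -> Exchange`, only `Project` and `Exchange` remain visible, so adding a new wrapper at the root does not change what the graph shows.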


cloud-fan · 2025-02-12T09:25:27Z

thanks, merging to master/4.0!

### What changes were proposed in this pull request?

Added ResultQueryStageExec for AQE

What the query plan looks like in the explain string:

```
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   ResultQueryStage 2 ------> newly added
   +- *(5) Project [id#26L]
      +- *(5) SortMergeJoin [id#26L], [id#27L], Inner
         :- *(3) Sort [id#26L ASC NULLS FIRST], false, 0
         :  +- AQEShuffleRead coalesced
         :     +- ShuffleQueryStage 0
         :        +- Exchange hashpartitioning(id#26L, 200), ENSURE_REQUIREMENTS, [plan_id=247]
         :           +- *(1) Range (0, 25600, step=1, splits=10)
         +- *(4) Sort [id#27L ASC NULLS FIRST], false, 0
            +- AQEShuffleRead coalesced
               +- ShuffleQueryStage 1
                  +- Exchange hashpartitioning(id#27L, 200), ENSURE_REQUIREMENTS, [plan_id=257]
                     +- *(2) Ran...
```

What the query plan looks like in the Spark UI:

<img width="680" alt="Screenshot 2025-02-03 at 4 11 43 PM" src="https://github.com/user-attachments/assets/86946e19-ffdd-42dd-974a-62a8300ddac8" />

### Why are the changes needed?

Currently the AQE framework is not fully self-contained, because not all plan segments can be put into a query stage: the final "stage" is basically executed as a non-AQE plan. This PR adds a result query stage for AQE to unify the framework. With this change, we can build more query-stage-level features; one use case is described in #44013 (comment).

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

New unit tests.

Existing tests that are impacted by this change are also updated to keep their original test semantics.

### Was this patch authored or co-authored using generative AI tooling?

NO

Closes #49715 from liuzqt/SPARK-51008.

Lead-authored-by: liuzqt <[email protected]>
Co-authored-by: Ziqi Liu <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 207390b)
Signed-off-by: Wenchen Fan <[email protected]>

draft

2f1669e

github-actions bot added the SQL label Jan 28, 2025

fix

5bfb8e5

liuzqt force-pushed the SPARK-51008 branch from 08df46a to 5bfb8e5 Compare January 31, 2025 19:50

liuzqt added 2 commits January 31, 2025 16:10

fix tests

6e1fd83

fix test

4251762

liuzqt changed the title ~~[SPARK-51008][SQL][WIP] Add ResultStage for AQE~~ [SPARK-51008][SQL] Add ResultStage for AQE Feb 3, 2025

cloud-fan reviewed Feb 4, 2025

sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala Outdated

cloud-fan reviewed Feb 4, 2025

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala

cloud-fan reviewed Feb 4, 2025

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala Outdated

cloud-fan reviewed Feb 4, 2025

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala Outdated

cloud-fan mentioned this pull request Feb 4, 2025

[SPARK-46090][SQL] Support plan fragment level SQL configs in AQE #44013

Closed

update

f13d11d

liuzqt requested a review from cloud-fan February 4, 2025 19:34

Merge remote-tracking branch 'upstream/master' into SPARK-51008

14f4ba8

ulysses-you reviewed Feb 5, 2025

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala

ulysses-you reviewed Feb 5, 2025

cloud-fan reviewed Feb 5, 2025

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala Outdated

cloud-fan reviewed Feb 5, 2025

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala Outdated

cloud-fan reviewed Feb 5, 2025

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala Outdated

cloud-fan reviewed Feb 5, 2025

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala Outdated

cloud-fan reviewed Feb 5, 2025

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala Outdated

liuzqt and others added 4 commits February 5, 2025 11:05

Update sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala

Co-authored-by: Wenchen Fan <[email protected]>

eb2875b

Update sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala

Co-authored-by: Wenchen Fan <[email protected]>

1ad4061

minor

915bf39

refactor createQueryStages

4248e55

liuzqt force-pushed the SPARK-51008 branch from 82a5871 to 4248e55 Compare February 6, 2025 00:32

liuzqt added 2 commits February 6, 2025 17:35

refactor back

7ba69f4

minor

1f376c3

liuzqt requested a review from cloud-fan February 7, 2025 02:41

liuzqt added 2 commits February 7, 2025 11:59

fix

ec58426

dereference the result from result stage

0ab3e20

liuzqt force-pushed the SPARK-51008 branch from 65048a1 to 0ab3e20 Compare February 8, 2025 18:31

cloud-fan reviewed Feb 10, 2025

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala

cloud-fan reviewed Feb 11, 2025

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala Outdated

cloud-fan reviewed Feb 11, 2025

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala Outdated

cloud-fan reviewed Feb 11, 2025

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala Outdated

cloud-fan approved these changes Feb 11, 2025

ulysses-you reviewed Feb 11, 2025

liuzqt added 2 commits February 11, 2025 17:55

nit

c8e25e0

hide result query stage in Spark UI

138c46c

github-actions bot added the WEB UI label Feb 12, 2025

use Statistics.DUMMY

9b1cd00

liuzqt requested a review from ulysses-you February 12, 2025 05:25

cloud-fan reviewed Feb 12, 2025

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala Outdated

Update sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala

cab59fe

cloud-fan approved these changes Feb 12, 2025

ulysses-you approved these changes Feb 12, 2025

cloud-fan closed this in 207390b Feb 12, 2025

nartal1 mentioned this pull request Mar 25, 2025

[BUG] Spark-4.0: Tests failures in AdaptiveQueryExecSuite NVIDIA/spark-rapids#12006

Closed

nartal1 mentioned this pull request Apr 25, 2025

[FEA][AUDIT][SPARK-51008][SQL] Add ResultStageExec for AQE - Databricks 13.3 issue NVIDIA/spark-rapids#12585

Open
